webAI is putting private, on-device intelligence in every pocket
Plus: CEO David Stout explains how they're running massive models on hardware you already own...

CV Deep Dive
Today we're talking with David Stout, co-founder and CEO of webAI.
Founded in late 2019, webAI started with a bold thesis: the most valuable AI systems of the future wouldn't live in centralized clouds; they'd live on devices people already own. Phones. Laptops. Embedded boards. Close to the data. Private. Offline-capable. And talking to one another like a mesh of intelligence.
David's journey began on a farm in northern Michigan, bounced through business school and early AI labs, and eventually led to a fundamental question: can we run serious AI locally, at scale, and without a cloud connection?
With a few roommates, David ported the YOLO-based Darknet model to an iPhone just to see if it would compile. That prototype kickstarted years of work on AI-native infrastructure (custom runtimes, device-level quantization, and a communication layer that lets models talk to each other across devices) and the founding of webAI.
Key Takeaways
Edge as the default: webAI enables large models to run on user-owned hardware, no hyperscaler required.
Context over scale: They believe superintelligence will emerge from millions of contextual models, not a single giant one.
Entropy-Weighted Quantization (EWQ): Their open-source technique dynamically quantizes models layer-by-layer to optimize for device performance without sacrificing accuracy.
Mission-critical traction: Deployed use cases span aerospace, precision medicine, robotics, and defense.
Tooling for builders: The Navigator and Companion platforms let teams create and deploy private AI without writing infra code.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with David 💬
David - welcome to Cerebral Valley! How did webAI begin?
I grew up on a farm in northern Michigan, then went to a small business school and bounced through a couple of universities before an AI lab finally hooked me in 2016, back when "AI" mostly meant hand-rolled machine-learning pipelines. I was fascinated by NLP and computer vision, but the idea that kept me up at night was: could we run serious models on the hardware people already own?
My roommates and I ported Darknet, an early YOLO model, to an iPhone just to see if it would compile. It did, barely, and that small victory convinced us the future of AI could be personal and live directly on devices users own and control.
We officially incorporated on December 26, 2019, and called it webAI: not about being an internet AI startup, but about creating a true web of models. Millions of specialized AI experts, working together, cooperating, disagreeing, cross-checking, but ultimately solving problems together.
What's the one-liner pitch for webAI?
We help enterprises build private, high-performance, high-accuracy AI models that run on hardware they own. If you want a model that works on a plane with no internet, or a healthcare tool that never leaves a secure perimeter, we're the best partner for that.
Just finished our first round of Meta's Llama 4 testing in Navigator. The numbers are even better than we expected.
Our webFrame technology on Apple Silicon is delivering:
- Maverick (unquantized): 13 tokens/second
- Maverick (4-bit): 52 tokens/second
These numbers are ONLY…
David Stout (@davidpstout) · 2:26 PM · Apr 9, 2025
What's the long-term vision behind the "web of models"?
We see the future of intelligence as driven by highly contextual, specialized models working together, rather than by scaling a single foundational model. It's not about one huge model; it's about coordinating many. That means building an edge-class runtime, a model-to-model communication layer, and an application layer that lets users and builders easily interact with all of it. That's what Navigator and Companion do. The result is fast, private, low-latency AI that works wherever it's needed.
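As a purely illustrative sketch (not webAI's actual runtime or communication protocol), the "web of models" idea can be pictured as a coordinator that routes a request to specialized local experts and merges their answers. The expert names and the keyword-based routing rule below are hypothetical stand-ins:

```python
# Toy "web of models": a coordinator fans a query out to specialized local
# experts and merges their outputs. Expert names and the routing rule are
# hypothetical, chosen only to illustrate the coordination pattern.
from typing import Callable

Expert = Callable[[str], str]

experts: dict[str, Expert] = {
    "vision":   lambda q: f"[vision model] analyzed: {q}",
    "language": lambda q: f"[language model] answered: {q}",
    "audio":    lambda q: f"[audio model] transcribed: {q}",
}

def route(query: str) -> list[str]:
    """Pick which specialized models should handle the query.
    Keyword matching here; a real system would use a learned router
    or a model-to-model protocol."""
    picks = [name for name in experts if name in query.lower()]
    return picks or ["language"]

def answer(query: str) -> str:
    """Send the query to the selected experts and merge their outputs."""
    return "\n".join(experts[name](query) for name in route(query))

print(answer("Use the vision model to inspect this engine photo"))
```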
Which sectors are leaning in first?
The ones where mistakes aren't an option: defense, medicine, manufacturing, aerospace. These are use cases where models need to be contextual, accurate, and able to run offline. I'm especially excited about what we're doing in robotics. That's going to be huge. It's a major use case for AI that needs to run locally but also talk to other models when needed. I think it's going to be a sweet spot for us over the next couple of years. Models that live on an assembly robot and only reach out when they need help: that's where edge AI shines.
We're proud to be named to the @CBinsights AI 100 - recognizing the most promising artificial intelligence startups of 2025!
This further validates our approach to private, local AI that keeps your data on your device.
Huge thanks to our team, partners, and community who made…
webAI (@thewebAI) · 5:34 PM · Apr 28, 2025
Can you share a customer story?
One use case is aviation maintenance. A lot of value is lost when engines are sent off unnecessarily. We're working to keep more planes in the air by enabling mechanics to use AI tools locally, on devices they already carry, to make better decisions on the spot. It saves time and money, and it improves safety.
Why choose edge over private cloud?
It's not just about privacy; it's about physics and economics. Realistically, you can't send huge amounts of multimodal data (like 32 terabytes) to the cloud, process it remotely, and still achieve real-time responsiveness. Bandwidth limitations, latency issues, and GPU costs quickly become prohibitive.
In practice, it's often better and more cost-effective to run AI locally, directly where the data lives. For example, when we benchmarked inference performance on typical enterprise workloads, we found big advantages using edge hardware. We compared token-per-dollar efficiency between consumer-grade hardware (like a Mac Studio with 128GB RAM) and data-center GPUs (like an Nvidia H100). The Mac Studio setup delivered significantly higher efficiency: about 100 million tokens per dollar, versus around 12 million tokens per dollar on an H100-based system.
This doesn't mean consumer hardware replaces data-center GPUs for every scenario. But for many real-world enterprise use cases, especially those sensitive to cost, latency, or privacy (like aviation maintenance in an aircraft hangar or diagnostics within secure healthcare environments), the edge setup delivers meaningful efficiency and speed advantages. The ability to deploy locally and economically at scale is the core value of edge AI, and that's why we believe it will ultimately win out over purely centralized approaches.
What technical challenges did you face building webAI's infrastructure?
There weren't many tools for what we wanted to do. CUDA was the only real option, but it doesn't realistically work when you try to bring massive models down to laptops or phones. We didn't raise millions of dollars in 2019, so we couldn't just lease GPUs; we had to make things work with less. That scarcity-driven mindset led directly to novel innovations and forced us to build a lot ourselves: new fundamentals, runtimes, and AI libraries written directly to hardware like shader cores and BN instruction sets.
One of our biggest contributions was entropy-weighted quantization (EWQ). EWQ profiles the device in real time and dynamically quantizes each layer: some layers at four bits, some at full precision. EWQ is our only open-source contribution so far, but it's a micro-example of how resource constraints led to the internal innovations that now define webAI.
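To make the per-layer idea concrete, here is a minimal conceptual sketch of entropy-based bit-width selection. It is not webAI's actual implementation: the entropy thresholds, the available bit-widths, and the assumption that lower-entropy layers tolerate more aggressive quantization are all illustrative.

```python
# Conceptual sketch of entropy-weighted quantization: choose a per-layer
# bit-width from the Shannon entropy of that layer's weight distribution.
# Thresholds and the low-entropy -> low-bit mapping are assumptions.
import numpy as np

def weight_entropy(weights: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of a layer's weight histogram."""
    hist, _ = np.histogram(weights, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def assign_bit_widths(layers: dict[str, np.ndarray],
                      low: float = 4.5, high: float = 6.0) -> dict[str, int]:
    """Map each layer to a bit-width: lower-entropy layers get quantized
    more aggressively, higher-entropy layers keep more precision."""
    plan = {}
    for name, w in layers.items():
        h = weight_entropy(w)
        if h < low:
            plan[name] = 4    # aggressive 4-bit quantization
        elif h < high:
            plan[name] = 8    # moderate 8-bit quantization
        else:
            plan[name] = 16   # keep (near) full precision
    return plan

# Example with random stand-in weights for four layers
layers = {f"block_{i}": np.random.randn(1024, 1024).astype(np.float32) for i in range(4)}
print(assign_bit_widths(layers))
```

In a real deployment the same profiling would also fold in the target device's memory and compute budget, which is what makes the quantization plan device-specific.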
Are you working beyond LLMs?
Yes, 90% of deployments are multimodal. Vision and language, sometimes audio. Retrieval-augmented generation is a major focus. Our systems are designed so that a model can take a photo, read from a knowledge base, and reason in context, all on one device. That's the power of doing it locally.
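As a rough illustration of what fully local retrieval-augmented generation looks like (a hypothetical sketch, not webAI's Navigator pipeline; the embedding function, knowledge base, and prompt format are stand-ins), the whole embed-retrieve-prompt loop can stay on one device:

```python
# Minimal on-device RAG sketch: embed a query, retrieve from a local
# knowledge base by cosine similarity, and build a prompt for a local model.
# The embedding is a deterministic stand-in, not a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real deployment would call a local embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

knowledge_base = [
    "Engine vibration above threshold usually indicates fan imbalance.",
    "A borescope inspection is required after any bird-strike event.",
]
kb_vectors = np.stack([embed(doc) for doc in knowledge_base])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Cosine-similarity retrieval over the local knowledge base."""
    scores = kb_vectors @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [knowledge_base[i] for i in top]

def build_prompt(query: str) -> str:
    """Compose a prompt from retrieved context; a local LLM would complete it."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What should I check after a bird strike?"))
```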
Why build Companion, if you're an infra company?
Enterprises needed a way to use what they built, without needing SDKs or engineering support. With Navigator, they can build a model. With Companion, they can deploy it to everyone in the org with one click. And in June, we're rolling out features that let external partners use those models securely, without ever touching the raw weights.
What's the biggest misconception about edge AI?
People think it's far off. It's not. It's here today, and in many cases, it's cheaper and better than cloud. You don't have to sacrifice privacy or performance; you can have both. We think of webAI as the factory of intelligence. We don't sell the intelligence; we give customers the ability to build their own.
We're proud to announce our partnership with @Divergent3D to advance human-AI collaboration!
This partnership combines our distributed AI technology with Divergent's pioneering digital manufacturing capabilities to create:
- Smarter industrial robotics
- More responsive…
webAI (@thewebAI) · 8:02 PM · Apr 2, 2025
Conclusion
Stay up to date on the latest with webAI, learn more here.
Read our past few Deep Dives below:
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.